24 research outputs found

Listening to the World Improves Speech Command Recognition

Author: McMahan Brian
Rao Delip
Publication date: 23/10/2017

We study transfer learning in convolutional network architectures applied to the task of recognizing audio, such as environmental sound events and speech commands. Our key finding is that not only is it possible to transfer representations from an unrelated task like environmental sound classification to a voice-focused task like speech command recognition, but also that doing so improves accuracies significantly. We also investigate the effect of increased model capacity for transfer learning on audio, by first validating known results from the field of Computer Vision of achieving better accuracies with increasingly deeper networks on two audio datasets: UrbanSound8k and the newly released Google Speech Commands dataset. Then we propose a simple multiscale input representation using dilated convolutions and show that it is able to aggregate larger contexts and increase classification performance. Further, the models trained using a combination of transfer learning and multiscale input representations need only 40% of the training data to achieve similar accuracies as a freshly trained model with 100% of the training data. Finally, we demonstrate a positive interaction effect for the multiscale input and transfer learning, making a case for the joint application of the two techniques. Comment: 8 pages
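To illustrate how dilated convolutions aggregate larger contexts, here is a minimal sketch in plain Python. It is not the architecture from the paper, which the abstract does not specify; the function names `dilated_conv1d` and `multiscale_features`, the kernel, and the choice of dilation rates are all assumptions made for illustration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid (no padding) 1-D cross-correlation with a dilated kernel.

    With dilation d, a kernel of length k covers (k - 1) * d + 1 input
    samples, so larger dilations see a wider context with the same
    number of weights.
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


def multiscale_features(signal, kernel, dilations):
    """Apply the same kernel at several dilation rates (hypothetical helper).

    Returns a dict mapping each dilation rate to its output sequence, so a
    downstream model could consume views of the input at several scales.
    """
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}
```

For example, with a length-3 kernel, dilation 1 covers 3 consecutive samples per output while dilation 2 covers a window of 5 samples, which is the sense in which dilation widens context at no extra parameter cost.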

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Learning Interpretable Style Embeddings via Prompting LLMs

Author: Callison-Burch Chris
Kothary Ansh
McKeown Kathleen
Patel Ajay
Rao Delip
Publication date: 09/10/2023

Style representation learning builds content-independent representations of author style in text. Stylometry, the analysis of style in text, is often performed by expert forensic linguists, and no large dataset of stylometric annotations exists for training. Current style representation learning uses neural methods to disentangle style from content to create style vectors. However, these approaches result in uninterpretable representations, complicating their usage in downstream applications like authorship attribution, where auditing and explainability are critical. In this work, we use prompting to perform stylometry on a large number of texts to create a synthetic dataset and train human-interpretable style representations we call LISA embeddings. We release our synthetic stylometry dataset and our interpretable style models as resources.

arXiv.org e-Print Archive

Entity-aspect linking: providing fine-grained semantics of entities in context

Author: Blei David M
Dietz Laura
Frontini Francesca
Li Peng
Nanni Federico
Pantel Patrick
Rao Delip
Ristoski Petar
Zhang Lei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

Crossref

MAnnheim DOCument Server

Open Knowledge Enrichment for Long-tail Entities

Author: Bast Hannah
Bhagavatula Chandra Sekhar
Bordes Antoine
Dettmers Tim
Gunaratna Kalpa
Hoffart Johannes
Lajus Jonathan
Li Yuan
Lin Yankai
Manning Christopher
Mintz Mike
Pasternack Jeff
Paulheim Heiko
Rao Delip
Reinanda Ridho
Shi Baoxu
Sun Zhiqing
Surdeanu Mihai
Tonon Alberto
Veličković Petar
Wang Xianzhi
Zangerle Eva
Zhang Ningyu
Publication date: 19/02/2020

Knowledge bases (KBs) have gradually become a valuable asset for many AI applications. While many current KBs are quite large, they are widely acknowledged as incomplete, especially lacking facts about long-tail entities, e.g., less famous persons. Existing approaches enrich KBs mainly by completing missing links or filling missing values. However, they only tackle a part of the enrichment problem and lack specific considerations regarding long-tail entities. In this paper, we propose a full-fledged approach to knowledge enrichment, which predicts missing properties and infers true facts of long-tail entities from the open Web. Prior knowledge from popular entities is leveraged to improve every enrichment step. Our experiments on the synthetic and real-world datasets and comparison with related work demonstrate the feasibility and superiority of the approach. Comment: Accepted by the 29th International World Wide Web Conference (WWW 2020)

arXiv.org e-Print Archive

Crossref

Ranking and semi-supervised classification on large scale graphs using map-reduce

Author: David Yarowsky
Delip Rao
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2009

Label Propagation, a standard algorithm for semi-supervised classification, suffers from scalability issues involving memory and computation when used with large-scale graphs from real-world datasets. In this paper we approach Label Propagation as the solution to a system of linear equations, which can be implemented as a scalable parallel algorithm using the map-reduce framework. In addition to semi-supervised classification, this approach to Label Propagation allows us to adapt the algorithm to make it usable for ranking on graphs and derive the theoretical connection between Label Propagation and PageRank. We provide empirical evidence to that effect using two natural language tasks – lexical relatedness and polarity induction. The version of the Label Propagation algorithm presented here scales linearly in the size of the data with a constant main memory requirement, in contrast to the quadratic cost of both in traditional approaches.
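The idea of treating Label Propagation as a linear system solved by repeated message passing can be sketched as follows. This is a single-machine illustration under stated assumptions, not the paper's map-reduce implementation: the function name `label_propagation`, the `(u, v, weight)` edge format, and the fixed iteration count are hypothetical. Each pass is a Jacobi-style update in which seed nodes stay clamped and every other node takes the weighted average of its neighbors' scores.

```python
from collections import defaultdict


def label_propagation(edges, seeds, iterations=50):
    """Jacobi-style label propagation on an undirected weighted graph.

    edges: list of (u, v, weight) tuples (assumed input format).
    seeds: dict mapping a labeled node to its fixed score.
    """
    # Build a symmetric adjacency map: node -> {neighbor: weight}.
    neighbors = defaultdict(dict)
    for u, v, w in edges:
        neighbors[u][v] = w
        neighbors[v][u] = w

    # Unlabeled nodes start at 0.0; seed nodes keep their given score.
    scores = {node: seeds.get(node, 0.0) for node in neighbors}

    for _ in range(iterations):
        # "Map" step: every node sends weight * score to each neighbor.
        messages = defaultdict(float)
        for node, nbrs in neighbors.items():
            for nbr, w in nbrs.items():
                messages[nbr] += w * scores[node]

        # "Reduce" step: each unlabeled node averages what it received,
        # normalizing by its total edge weight; seeds stay clamped.
        new_scores = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                new_scores[node] = seeds[node]
            else:
                new_scores[node] = messages[node] / sum(nbrs.values())
        scores = new_scores
    return scores
```

On a three-node chain a, b, c with unit weights and seeds a = 1.0 and c = 0.0, the middle node settles at 0.5, the average of its two clamped neighbors. Each iteration touches every edge twice and stores one score per node, which is the per-pass work that a map-reduce formulation would distribute.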

CiteSeerX

Crossref

Natural language processing with PyTorch: build intelligent language applications using deep learning

Author: McMahan Brian
Rao Delip
Publication venue: O'Reilly Media
Publication date: 01/01/2019

CERN Document Server

Learning Efficient Representations for Fake Speech Detection

Author: Rao Delip
Subramani Nishant
Publication venue: 'Association for the Advancement of Artificial Intelligence (AAAI)'
Publication date: 03/04/2020

Synthetic speech or “fake speech” which matches personal vocal traits has become better and cheaper due to advances in deep learning-based speech synthesis and voice conversion approaches. This increased accessibility of synthetic speech systems and the growing misuse of them highlight the critical need to build countermeasures. Furthermore, new synthesis models evolve all the time and the efficacy of previously trained detection models on these unseen attack vectors is poor. In this paper, we focus on: 1) How can we build highly accurate, yet parameter and sample-efficient models for fake speech detection? 2) How can we rapidly adapt detection models to new sources of fake speech? We present four parameter-efficient convolutional architectures for fake speech detection with best detection F1 scores of around 97 points on a large dataset of fake and bona fide speech. We show how the fake speech detection task naturally lends itself to a novel multi-task problem, further improving F1 scores for a mere 0.5% increase in model parameters. Our multi-task setting also helps in data-sparse situations, commonplace in adversarial settings. We investigate an alternative approach to the data-sparsity problem using transfer learning and show that it is possible to meet purely supervised detection performance for unseen attack vectors with as little as 6.25% of the training data. This is the first known application of transfer learning in adversarial settings for speech. Finally, we show how well our transfer learning approach adapts in an instance-efficient way to new attack vectors using the Real-Time Voice Cloning toolkit. We exceed the purely supervised detection performance (99.18 F1) with as little as 6.25% of the data.

Association for the Advancement of Artificial Intelligence: AAAI Publications